IUPHAR/BPS Guide to Pharmacology (Pawson et al. 2014). Finally, the protein-protein
interaction data, disease gene data, and drug target data were combined to
calculate the network distance between every drug and a given disease (Cheng
et al. 2018).
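The text above does not give the distance formula, so the following is only a minimal sketch under one assumption: a "closest" measure, in which each drug target is scored by its shortest-path distance to the nearest disease gene and the scores are averaged. The toy interactome, the protein names, and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(graph, drug_targets, disease_genes):
    """Average, over drug targets, of the distance to the nearest disease gene.

    Assumed definition for illustration; targets that cannot reach any
    disease gene are skipped.
    """
    total = 0
    counted = 0
    for target in drug_targets:
        dist = shortest_path_lengths(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            total += min(reachable)
            counted += 1
    return total / counted if counted else float("inf")


# Toy undirected interactome stored as an adjacency map (hypothetical proteins).
toy_ppi = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
drug_targets = {"A", "C"}
disease_genes = {"D", "E"}

distance = closest_distance(toy_ppi, drug_targets, disease_genes)
print(distance)
```

In this toy network, target A is three steps from its nearest disease gene and target C is one step away, so the drug-disease distance is 2.0. A raw distance of this kind is usually compared against distances obtained for randomly chosen protein sets before it is interpreted; that normalization is omitted here.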
4.3.4 Chemical Similarity
The chemical similarity ensemble approach (Keiser et al. 2007) compares target
proteins by the chemical similarity of the ligands that bind to them. Similarity is
expressed as an expectation value (e-value), adapting the statistics of the basic local
alignment search tool (BLAST) algorithms (Altschul et al. 1990; Hert et al. 2008). Here,
the structural similarity between each drug and each target's ligand set was quantified
as such an e-value (Keiser et al. 2007). The approach can quickly search large ligand
databases and build similarity maps among target proteins at large scale. It
differs from traditional bioinformatics methods, which identify similarity between
proteins from their amino acid sequences or three-dimensional structures.
A total of ~3600 drugs were compared against
~65,000 ligands organized into 246 targets from the MDL Drug Data Report
database (Schuffenhauer et al. 2002), generating 0.9 million drug-target
comparisons. Most drugs had no significant expectation value against most of
the ligand sets. Among all possible pairs of drugs and ligand sets, ~6900 pairs
were similar, with significant e-values. Predicted off-target proteins with strong
similarity ensemble expectation values were then evaluated for novelty against
the literature.
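The set-versus-set comparison described above can be sketched as follows. This is an illustrative simplification, not the published implementation: fingerprints are represented as plain Python sets of feature indices, the Tanimoto coefficient is assumed as the pairwise similarity measure, and the 0.5 threshold is a hypothetical value. Converting the raw score into an e-value requires a background model fitted on random ligand sets and is not shown.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def raw_set_score(query_fps, ligand_set_fps, threshold=0.5):
    """Sum of pairwise Tanimoto coefficients at or above the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    score = 0.0
    for query_fp in query_fps:
        for ligand_fp in ligand_set_fps:
            similarity = tanimoto(query_fp, ligand_fp)
            if similarity >= threshold:
                score += similarity
    return score


# One drug compared against a toy ligand set for one target (made-up fingerprints).
drug = [{1, 2, 3, 4}]
target_ligands = [{1, 2, 3, 5}, {1, 2, 3, 4}, {7, 8, 9}]
score = raw_set_score(drug, target_ligands)
print(round(score, 3))

# Scale check using the approximate counts quoted in the text:
# ~3600 drugs against 246 target ligand sets.
n_comparisons = 3600 * 246
print(n_comparisons)
```

The scale check gives 885,600 drug-target comparisons, which agrees with the figure of 0.9 million quoted in the text.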
4.4 Summary
Low-cost sequencing and other high-throughput technologies are generating a massive
number of genomic datasets in biology and medicine. A large number of candidate
disease genes have now been identified through GWAS and other approaches. Massive
single-cell transcriptomic datasets are enabling precise identification of the cell types
and associated gene expression signatures involved in different diseases. There is also
a growing amount of data on FDA-approved drugs and on drugs that are not toxic to
humans but failed to treat their intended diseases. Integrating
the current datasets on single-cell transcriptomics, genotype-phenotype associations,
pharmacogenomics, protein-protein interactions, and pathways could ultimately
reveal drug action mechanisms, disease mechanisms, and new uses of
existing drugs. However, current methods for handling massive, high-dimensional
genomic data (big data) are very limited. New statistical and computational
methods are needed to handle rapidly growing, high-dimensional, and
heterogeneous genomic datasets and to apply them to drug repurposing.